Conversation

@nagkumar91 (Member) commented Aug 12, 2025

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for reasoning models to evaluators by introducing an is_reasoning_model keyword parameter. When set, this parameter updates the evaluator configuration appropriately for reasoning models, enabling better integration with Azure OpenAI's reasoning capabilities.

Key Changes:

  • Added is_reasoning_model parameter to all evaluators' constructors
  • Updated QAEvaluator to propagate this parameter to child evaluators
  • Added defensive parameter checking in GroundednessEvaluator for backward compatibility
  • Updated documentation across evaluators to describe the new parameter
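
To make the new flag concrete, here is a minimal usage sketch (assuming the azure-ai-evaluation package with this PR applied; the endpoint, deployment, and key are placeholders, and the exact evaluator call signature may vary slightly between package versions):

from azure.ai.evaluation import FluencyEvaluator

# Placeholder Azure OpenAI configuration pointing at a reasoning-model deployment.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "azure_deployment": "<reasoning-model-deployment>",
    "api_key": "<api-key>",
}

# is_reasoning_model is the keyword introduced by this PR; when set, the evaluator
# adjusts its prompty/model parameters so they are valid for reasoning models.
fluency = FluencyEvaluator(model_config=model_config, is_reasoning_model=True)
result = fluency(response="The capital of France is Paris.")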

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Summary per file:

  • _similarity/_similarity.py: Added is_reasoning_model parameter and updated docstrings
  • _retrieval/_retrieval.py: Added is_reasoning_model parameter support
  • _response_completeness/_response_completeness.py: Added is_reasoning_model parameter and improved formatting
  • _relevance/_relevance.py: Added is_reasoning_model parameter support
  • _qa/_qa.py: Updated to propagate is_reasoning_model to child evaluators
  • _groundedness/_groundedness.py: Added parameter support with backward compatibility checks
  • _fluency/_fluency.py: Added is_reasoning_model parameter and updated docstrings
  • _base_prompty_eval.py: Updated to pass is_reasoning_model to AsyncPrompty.load
  • _base_multi_eval.py: Minor import formatting improvement
  • _coherence/_coherence.py: Added is_reasoning_model parameter and updated docstrings
  • CHANGELOG.md: Documented the new feature and bug fix


@nagkumar91 requested a review from Copilot on September 24, 2025 at 15:15
@Copilot Copilot AI left a comment

Pull Request Overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.


 if is_reasoning_model:
-    parameters = configs.get("model", {}).get("parameters", {})
+    parameters = configs.get("model", {}).get("parameters") or {}
Copilot AI Sep 24, 2025

[nitpick] The expression or {} is redundant since dict.get() with a default empty dict already handles the None case. Consider simplifying to parameters = configs.get("model", {}).get("parameters", {})

Suggested change
-parameters = configs.get("model", {}).get("parameters") or {}
+parameters = configs.get("model", {}).get("parameters", {})

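For context on the two forms (a minimal standalone sketch, not code from this PR): dict.get(key, default) only falls back when the key is missing, while the or {} form also replaces an explicit None value, so the two are not strictly equivalent.

configs = {"model": {"parameters": None}}  # "parameters" present but explicitly None

a = configs.get("model", {}).get("parameters", {})    # -> None (key exists, so the default is not used)
b = configs.get("model", {}).get("parameters") or {}  # -> {}   (falsy None is replaced)

print(a, b)  # None {}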

Comment on lines +89 to +96
if is_reasoning_model:
    # Merge sanitized parameters into the override model dict so PF uses them
    base_params = _extract_prompty_parameters(source)
    sanitized = _sanitize_reasoning_parameters(base_params)
    override_params = dict(model.get("parameters", {}) or {})
    override_params.update(sanitized)
    model["parameters"] = override_params
    kwargs["model"] = model
Copilot AI Sep 24, 2025

This reasoning-model parameter sanitization code is unreachable: it sits after the early return on line 69 that fires when is_reasoning_model is True. Move the logic before the early return, or restructure the condition.

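To illustrate the restructuring the comment asks for, here is a small self-contained sketch. The helper names mirror those in the diff above, but the stub bodies and the build_model_override wrapper are hypothetical; the point is only the ordering: the sanitized overrides must be merged before any early return for reasoning models.

from typing import Any, Dict

def _extract_prompty_parameters(source: str) -> Dict[str, Any]:
    # Hypothetical stub: the real helper reads the parameters defined in the prompty file.
    return {"temperature": 0.0, "max_tokens": 800}

def _sanitize_reasoning_parameters(params: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stub: the real helper drops settings unsupported by reasoning models.
    return {k: v for k, v in params.items() if k not in ("temperature", "max_tokens")}

def build_model_override(source: str, model: Dict[str, Any], is_reasoning_model: bool) -> Dict[str, Any]:
    if is_reasoning_model:
        # Merge the sanitized parameters before returning, so the override actually takes effect.
        override_params = dict(model.get("parameters", {}) or {})
        override_params.update(_sanitize_reasoning_parameters(_extract_prompty_parameters(source)))
        model["parameters"] = override_params
    return model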

Comment on lines +1032 to +1033
_use_run_submitter_client = cast(Optional[bool], evaluate_kwargs.pop("_use_run_submitter_client", None))
_use_pf_client = cast(Optional[bool], evaluate_kwargs.pop("_use_pf_client", None))
Copilot AI Sep 24, 2025

The variable name is incorrect: this should be kwargs.pop() instead of evaluate_kwargs.pop(), matching the kwargs mapping referenced in the original code.

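A small sketch of the pattern the comment expects (the function name and printout below are illustrative, not the PR's actual code): the private switches are popped off the kwargs mapping the function receives, so they are not forwarded downstream.

from typing import Any, Optional, cast

def _evaluate(**kwargs: Any) -> None:
    # Pop private switches off kwargs (the mapping this function actually receives),
    # defaulting to None when the caller did not pass them.
    use_run_submitter_client = cast(Optional[bool], kwargs.pop("_use_run_submitter_client", None))
    use_pf_client = cast(Optional[bool], kwargs.pop("_use_pf_client", None))
    print(use_run_submitter_client, use_pf_client)

_evaluate(_use_pf_client=True)  # -> None True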

-import os, logging
+import os
+import logging
+from inspect import signature
Copilot AI Sep 24, 2025

The signature import from inspect is not used in this file and should be removed to avoid unused imports.

Suggested change
-from inspect import signature


Comment on lines +41 to +46
# Ensure we do not override prompty temperature/max_tokens in model parameters
# Only extra_headers should be present in parameters added by code
model_cfg = kwargs.get("model")
assert isinstance(model_cfg, dict)
params = model_cfg.get("parameters", {})
# Our code only sets extra_headers inside parameters; temperature/max_tokens come from prompty
Copilot AI Sep 24, 2025

[nitpick] The comment is misleading. The test verifies that temperature and max_tokens are NOT in the parameters dict, but the comment suggests they should be preserved from the prompty file. Consider clarifying that these parameters should be removed for reasoning models.

Suggested change
-# Ensure we do not override prompty temperature/max_tokens in model parameters
-# Only extra_headers should be present in parameters added by code
+# For reasoning models, temperature and max_tokens should be removed from parameters.
+# Only extra_headers should be present in parameters added by code.
 model_cfg = kwargs.get("model")
 assert isinstance(model_cfg, dict)
 params = model_cfg.get("parameters", {})
-# Our code only sets extra_headers inside parameters; temperature/max_tokens come from prompty
+# Our code only sets extra_headers inside parameters; temperature/max_tokens are intentionally omitted for reasoning models.

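Restated as a self-contained check (names taken from the snippet above; the captured kwargs value is a hypothetical example): for reasoning models the override parameters must not contain temperature or max_tokens, only keys the code itself adds, such as extra_headers.

def assert_reasoning_parameters_sanitized(kwargs: dict) -> None:
    # For reasoning models, temperature and max_tokens must be absent from the
    # override parameters; only code-added keys such as extra_headers may remain.
    model_cfg = kwargs.get("model")
    assert isinstance(model_cfg, dict)
    params = model_cfg.get("parameters", {})
    assert "temperature" not in params
    assert "max_tokens" not in params

# Hypothetical kwargs as a patched AsyncPrompty.load might have captured them.
assert_reasoning_parameters_sanitized(
    {"model": {"parameters": {"extra_headers": {"x-ms-useragent": "azure-ai-evaluation"}}}}
)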

Labels: Evaluation (Issues related to the client library for Azure AI Evaluation)